Abstract

We show an $O^*(2^k)$-time polynomial space algorithm for the $k$-sized Graph Motif
problem. We also introduce a new optimization variant of the problem, called Closest
Graph Motif, and solve it within the same time bound. The Closest Graph Motif
problem encompasses several previously studied optimization variants, like Maximum
Graph Motif, Min-Substitute, and Min-Add.

Moreover, we provide a piece of evidence that our result might be essentially tight:
the existence of an $O((2-\epsilon)^k)$-time algorithm for the Graph Motif problem implies an
$O((2-\epsilon')^n)$-time algorithm for Set Cover.

1 Introduction 

The Graph Motif problem is defined as follows. We are given an undirected graph $G =
(V, E)$, a vertex coloring $c : V \to C$, and a multiset $M$ consisting of colors in the set $C$. The
goal is to find a subset $S \subseteq V$ such that the induced subgraph $G[S]$ is connected, and the
multiset $c(S)$ of colors of the vertices of $S$ is equal to $M$. To avoid confusion, let us stress
that the input function $c$ is arbitrary and it does not need to be a proper vertex coloring. Let
$k = |S|$ denote the size of the solution (which is $|M|$ in the case of Graph Motif but may
differ from $|M|$ in variants of the problem also considered in this paper).

Graph Motif was introduced by Lacroix et al. [15] and motivated by applications in
bioinformatics, specifically in metabolic network analysis. It is known to be NP-hard even
when the given graph is a tree of maximum degree 3 and the motif is a set [5]. However,
in practice the size of $M$ is expected to be small, which motivates the research on so-called
FPT algorithms parameterized by $k$, that is, algorithms with running times $O(f(k)n^c)$, where
$n = |V|$ and $c$ is a constant (this is commonly abbreviated by $O^*(f(k))$). Indeed, the paper
of Fellows et al. [8] showed that such an algorithm is possible. While the initial algorithm
was rather impractical because of the rapidly growing function $f(k)$, it was followed by
a series of improvements (see Table 1).

Table 1: The progress on FPT algorithms for the Graph Motif problem

The two most recent results, namely the $O^*(4^k)$-time algorithm of Guillemot and Sikora [10]
and the $O^*(2.54^k)$-time algorithm of Koutis [13], apply the approach of multilinear detection,
which was introduced by Koutis [11] and further developed by Williams [18] and by Koutis and
Williams [14] to solve subgraph containment problems. Inspired by the Graph Motif application
and the reduction to multilinear detection provided by [10], Koutis [13] introduced a
tailored variant of multilinear detection called constrained multilinear detection (abbreviated
$k$-CMLD) to make further progress on Graph Motif. In the $k$-CMLD problem we are
asked to determine whether an arithmetic circuit that computes a multivariate polynomial has a
$k$-sized odd-coefficient multilinear monomial of a certain kind. The variables are colored and
all colors have a budget, and we want the sought monomial to not violate the budget for
any color. At a Dagstuhl seminar in 2010, Koutis [12] posed the open problem of devising
an $O^*(2^k)$-time algorithm for $k$-CMLD. His recent paper [13] provides an $O^*(2.54^k)$-time
algorithm for the problem, where the worst-case bound results from a budget of at most three
occurrences of every color.

In this paper we show an $O^*(2^k)$-time polynomial space algorithm for $k$-CMLD, thereby
answering Koutis's open problem in the affirmative. Since the main application of $k$-CMLD
is the Graph Motif problem and its variants, we first present the result directly in terms
of graph motifs. In the appendix we give a self-contained proof for the general $k$-CMLD.
Our approach is much inspired by Koutis's beautiful idea of assigning random subspaces of
dimension equal to the multiplicities (budgets) of the colors. He used group algebras $\mathbb{F}_2[\mathbb{Z}_2^k]$
for his construction, whereas ours seems to require the larger $\mathbb{F}_{2^\beta}[\mathbb{Z}_2^k]$ for some $\beta = \Omega(\log k)$ to
work. Rather than proving the result in terms of group algebra as Koutis suggests, we provide
a construction using inclusion-exclusion over labelled indeterminates. As in [3], a paper using
the technique co-authored by a subset of the present authors, we find this alternative easier
to reason about.

A further contribution of the present work is to develop a generalization of Graph Motif
called Closest Graph Motif. In particular, we introduce a notion of edit distance between
two multisets, and the objective is to find the subset $S \subseteq V$ such that $G[S]$ is connected and
the edit distance from $M$ to $c(S)$ is minimized (see Section 3 for a precise definition). In
the literature there are some more optimization variants of Graph Motif and our Closest
Graph Motif generalizes three of them: Maximum Graph Motif (see Section 2),
Min-Add and Min-Substitute (see Section 3). The previous fastest algorithms for these
problems are due to Koutis [13]; he shows $O^*(2.54^k)$-time algorithms for Maximum Graph
Motif and Min-Add, and an $O^*(5.08^k)$-time algorithm for Min-Substitute. We present an
$O^*(2^k)$-time algorithm for the general Closest Graph Motif.

Similarly to the three previous FPT improvements on Graph Motif relative to the
parameter $k$, our algorithm is Monte Carlo with one-sided error (with probability bounded
by $2^{-k}$ for the event that the algorithm reports the absence of a solution when in fact there
is one).

In addition to our algorithmic results, we give a piece of evidence that further improvement
on the running time is substantially harder. Namely, we show that for any $\epsilon > 0$ the existence
of an $O((2-\epsilon)^k)$-time algorithm for the Graph Motif problem implies an $O((2-\epsilon')^n)$-time
algorithm for Set Cover, for some $\epsilon' > 0$. Thus, instead of trying to improve our algorithm
one should rather attack the more generic Set Cover. Set Cover is a well-known problem
researched for decades, which suggests that an $O((2-\epsilon)^n)$-time algorithm for it, if possible at
all, would be a major breakthrough in the field. The nonexistence of such an algorithm has
already been used as an assumption for proving hardness results [4]. In fact it is conjectured [4]
that an $O((2-\epsilon)^n)$-time algorithm for Set Cover contradicts the SETH (Strong Exponential
Time Hypothesis, which states that there is no $O((2-\epsilon)^n)$-time algorithm for SAT). This
conjecture is supported by the fact that the number of solutions to Set Cover cannot be
computed in time $O((2-\epsilon)^n)$ for any $\epsilon > 0$ unless SETH fails [4]. Another consequence of
the existence of such a counting algorithm is that then there is also an $O((2-\epsilon')^n)$-time
algorithm to compute an integer $n \times n$ matrix permanent for some $\epsilon' > 0$ [2].

This paper is organized as follows. In Section 2 we describe our $O^*(2^k)$-time algorithm
for Maximum Graph Motif, while in Section 3 we describe how our algorithm works for
the more general Closest Graph Motif and how it encompasses earlier named variants.
In Section 4 we show the reduction from Set Cover.

2 An $O^*(2^k)$-time algorithm for Maximum Graph Motif
2.1 The general approach 

In this section we will study a slight generalization of Graph Motif, namely the Maximum
Graph Motif problem parameterized by the solution size $k$. That is, for a given instance
$(G, c, M)$, the task is to decide whether there exists a subset $S \subseteq V$ such that (i) the induced
subgraph $G[S]$ is connected, (ii) $|S| = k$, and (iii) the multiset inclusion $c(S) \subseteq M$ holds.

It is immediate that once we have an algorithm for the decision problem that runs in
$T(n, k)$ time, we can find a solution in $O(nT(n, k))$ time as follows. For every vertex $v \in V$,
check whether a solution still exists if we remove $v$; if not, then put the vertex back; in both
cases proceed to the next vertex. After iterating over all vertices, we are left with the desired
induced subgraph $G[S]$.
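The self-reduction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `find_solution`, `brute_force_decide`, and `connected` are hypothetical, and the exhaustive oracle merely stands in for the decision algorithm developed in this section. With a Monte Carlo oracle, the one-sided error of each call would have to be accounted for separately.

```python
from collections import Counter
from itertools import combinations

def connected(adj, S):
    """Return True if the subgraph induced by the vertex set S is connected."""
    S = set(S)
    if not S:
        return False
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in S and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == S

def brute_force_decide(adj, color, motif, k, alive):
    """Stand-in oracle (an assumption): exhaustive search over k-subsets of `alive`."""
    m = Counter(motif)
    for S in combinations(sorted(alive), k):
        used = Counter(color[v] for v in S)
        if all(used[q] <= m[q] for q in used) and connected(adj, S):
            return True
    return False

def find_solution(adj, color, motif, k, decide=brute_force_decide):
    """Turn a decision oracle into a search procedure with n + 1 oracle calls."""
    alive = set(adj)
    if not decide(adj, color, motif, k, alive):
        return None
    for v in sorted(adj):
        alive.discard(v)                      # tentatively remove v
        if not decide(adj, color, motif, k, alive):
            alive.add(v)                      # v is needed: put it back
    return alive
```

Every vertex that survives the loop is contained in every remaining solution, so the surviving set has exactly $k$ vertices and is itself a solution.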

To solve the decision problem we use the following approach. For every vertex $r \in V$,
we define a multivariate polynomial $P_r$ that has two key properties: (1) $P_r$ is not identically
zero if and only if there is a solution $S$ that contains $r$, and (2) $P_r$ can be evaluated fast
(that is, in $O^*(2^k)$ time) at any given point. Then an $O^*(2^k)$-time algorithm follows via the
DeMillo-Lipton-Schwartz-Zippel Lemma (see Section 2.6).

The main difference between the present approach and earlier works that deploy a similar
polynomial sieve is that we employ the same "labels" (the universe of $k$ elements whose
subsets we use in sieving) to simultaneously accomplish two different tasks: (i) we sieve out
all homomorphisms that are not injective and (ii) we sieve out all multisets that use too many
colors when compared with $M$. (The $k$-CMLD algorithm in the appendix in turn uses the
same ideas to sieve out (i) non-multilinear monomials and (ii) monomials that contain at least
one color too many times.)

The rest of the section is organized as follows. First we give some preliminaries and
notation on branching walks and labellings in Sections 2.2 and 2.3. In Section 2.4 we define
the polynomial $P_r$ and we prove the property (1) as Lemma 1. In Section 2.5 we show
property (2) and finally in Section 2.6 we describe the complete algorithm and analyze its
failure probability using the DeMillo-Lipton-Schwartz-Zippel Lemma.

2.2 Preliminaries on branching walks 

The concept of branching walks was first introduced by Nederlof [16] to sieve for Steiner
trees. We define them as follows. Let $G$ be a graph with vertex set $V = V(G)$ and edge set
$E = E(G)$. A mapping $h : V(T) \to V(G)$ is a homomorphism from a graph $T$ to a graph $G$
if for all $\{u, v\} \in E(T)$ it holds that $\{h(u), h(v)\} \in E(G)$. We adopt the convention of calling
the elements of $V(T)$ nodes and the elements of $V(G)$ vertices.

A branching walk of length $l$ in $G$ is a pair $W = (T, h)$ where $T$ is an ordered rooted tree
with node set $V(T) = \{1, 2, \ldots, l\}$ such that every node $v \in V(T)$ coincides with its rank in
the preorder traversal of $T$, and $h : V(T) \to V(G)$ is a homomorphism from $T$ to $G$. For a
vertex $r \in V$, we say that $W$ starts from $r$ if $h(1) = r$.

Let $W = (T, h)$ be a branching walk in $G$. We define $h(T)$ to be the subgraph of $G$ induced
by the set of edges $\{\{h(u), h(v)\} : \{u, v\} \in E(T)\}$. We observe that $h(T)$ is not necessarily a
tree because $h$ need not be injective. We say that $W$ is simple if $h$ is injective.

It will be convenient to assume that $V(G)$ is totally ordered. Towards this end, let us
assume that $V = V(G) = \{1, 2, \ldots, n\}$. We say that a branching walk $W = (T, h)$ in $G$ is
properly ordered if any two sibling nodes $u < v$ in $T$ satisfy $h(u) < h(v)$.
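To make these definitions concrete, here is a minimal Python sketch that checks them on small inputs. The representation is an assumption made for illustration: the tree $T$ is a dictionary `parent` mapping each node to its parent (the root 1 maps to `None`), `h` is a dictionary from nodes to vertices, and `adj` maps each vertex to its set of neighbours. All function names are hypothetical.

```python
def children_lists(parent):
    """Children of every node, listed in increasing node order."""
    ch = {v: [] for v in parent}
    for v in sorted(parent):
        if parent[v] is not None:
            ch[parent[v]].append(v)
    return ch

def is_preorder_numbered(parent):
    """Check that the nodes 1..l coincide with their preorder ranks."""
    if parent.get(1, 0) is not None:          # node 1 must exist and be the root
        return False
    ch = children_lists(parent)
    order, stack = [], [1]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(ch[v]))         # visit children left to right
    return order == list(range(1, len(parent) + 1))

def is_branching_walk(parent, h, adj):
    """T is preorder-numbered and h maps every tree edge to an edge of G."""
    return is_preorder_numbered(parent) and all(
        h[v] in adj[h[p]] for v, p in parent.items() if p is not None)

def is_simple(h):
    """W is simple when h is injective."""
    return len(set(h.values())) == len(h)

def is_properly_ordered(parent, h):
    """Sibling nodes u < v must satisfy h(u) < h(v)."""
    for sibs in children_lists(parent).values():
        if any(h[u] >= h[v] for u, v in zip(sibs, sibs[1:])):
            return False
    return True
```

For example, on the graph with edges 1-2, 1-3, 2-3, 3-4, the tree with `parent = {1: None, 2: 1, 3: 2, 4: 1}` and `h = {1: 3, 2: 1, 3: 2, 4: 4}` is a simple, properly ordered branching walk starting from vertex 3.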

2.3 Labelling and shading 

Let $(G, c, M)$ be the input instance of the Maximum Graph Motif problem and let $m : C \to
\mathbb{N}$ be the multiplicity function for $M$. For each color $q \in C$ and $i = 1, 2, \ldots, m(q)$, let us call the
formal pair $(q, i)$ the $i$-th shade of color $q$. In particular, the number of shades for each color
matches the multiplicity of the color in $M$. Let us write $D(q) = \{(q, i) : i = 1, 2, \ldots, m(q)\}$
for the set of all shades of color $q \in C$, and $D = \bigcup_{q \in C} D(q)$ for the set of all shades of all
colors.

A branching walk $(T, h)$ in $G$ may be labelled with a function $\ell : V(T) \to \{1, 2, \ldots, k\}$.
The three-tuple $(T, h, \ell)$ is called a labelled branching walk, and the function $\ell$ is a labelling of
the branching walk.

A branching walk $(T, h)$ in $G$ may also be shaded with a function $s : V(T) \to D$. A
shading may also be partial, that is, of the form $s : U \to D$ for a subset $U \subseteq V(T)$. We say
that a (partial) shading $s : U \to D$ of a branching walk $(T, h)$ is consistent with the input
coloring $c : V(G) \to C$ if for every node $v \in U$ we have $s(v) \in D(c(h(v)))$. For a branching
walk $W = (T, h)$ and a subset $U \subseteq V(T)$, denote by $\mathbb{S}_U(W)$ the set of all consistent (partial)
shadings $s : U \to D$. Let us abbreviate $\mathbb{S}(W) = \mathbb{S}_{V(T)}(W)$.
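As a small illustration of shades and consistent shadings, the following Python sketch builds the shade sets from a motif and enumerates all consistent (partial) shadings of a walk. The function names and the dictionary-based representation (`h` from nodes to vertices, `color` from vertices to colors) are assumptions made for this example only.

```python
from collections import Counter
from itertools import product

def shades(motif):
    """Map each color q to its list of shades (q, 1), ..., (q, m(q))."""
    m = Counter(motif)
    return {q: [(q, i) for i in range(1, m[q] + 1)] for q in m}

def consistent_shadings(h, color, D, nodes=None):
    """Yield every consistent shading s : U -> D, where U defaults to all nodes.

    A shading is consistent when s(v) is a shade of the color c(h(v)).
    """
    nodes = sorted(h) if nodes is None else sorted(nodes)
    choices = [D.get(color[h[v]], []) for v in nodes]
    for combo in product(*choices):
        yield dict(zip(nodes, combo))
```

With the motif `['a', 'a', 'b']`, a walk whose two nodes map to an `a`-colored and a `b`-colored vertex has exactly two consistent shadings, and a walk touching a vertex whose color does not occur in the motif has none.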

2.4 The polynomials P r 

For $r \in V$, denote by $\mathcal{W}_r$ the set of all branching walks $W = (T, h)$ in $G$ that are (i)
properly ordered, (ii) start from $r$, and (iii) satisfy $|V(T)| = k$.

We use three different types of indeterminates in our polynomials. First, for each edge
$\{a, b\} \in E(G)$ with $a < b$, introduce an indeterminate $x_{a,b}$. Second, for each vertex $a \in V(G)$
and each shade $t \in D(c(a))$, introduce an indeterminate $y_{a,t}$. Third, for each shade $t \in D$
and each label $j \in \{1, 2, \ldots, k\}$, introduce an indeterminate $z_{t,j}$. Let us write $x, y, z$ for the
sequences of all the $x_{a,b}$-type, $y_{a,t}$-type, and $z_{t,j}$-type variables, respectively.

Let $W = (T, h)$ be a branching walk in $G$, let $s : V(T) \to D$ be a consistent shading of $W$,
and let $\ell : V(T) \to \{1, 2, \ldots, k\}$ be a labelling of $W$. Associate with the consistently shaded
and labelled branching walk $(W, s, \ell)$ the monomial

$$\mathrm{mon}(W, s, \ell) = \prod_{\{u,v\} \in E(T)} x_{h(u),h(v)} \prod_{v \in V(T)} y_{h(v),s(v)} \, z_{s(v),\ell(v)} .$$

Let $\beta = \lceil \log k \rceil + 3$ and denote by $\mathbb{F}_{2^\beta}$ the finite field of order $2^\beta$. Define the multivariate
polynomial $P_r$ with coefficients in $\mathbb{F}_{2^\beta}$ by setting

$$P_r(x, y, z) = \sum_{W = (T,h) \in \mathcal{W}_r} \; \sum_{s \in \mathbb{S}(W)} \; \sum_{\substack{\ell : V(T) \to \{1,2,\ldots,k\} \\ \ell \text{ bijective}}} \mathrm{mon}(W, s, \ell) .$$

Lemma 1. We have $P_r \not\equiv 0$ if and only if there exists a solution $S \subseteq V(G)$ with $r \in S$.

Proof. ($\Leftarrow$) Let $T_S$ be a spanning tree of $G[S]$. Transform $T_S$ into a rooted tree with root
$r \in S$, and make $T_S$ an ordered tree so that the children of every vertex listed in tree order
form an increasing sequence. If we replace every vertex in $T_S$ with its rank in a preorder
traversal, we obtain a properly ordered simple branching walk $W = (T, h)$ starting at $r$.
Define the shading $s : V(T) \to D$ for each node $v \in V(T)$ by setting

$$s(v) = \bigl(c(h(v)),\; |\{w \in V(T) : c(h(w)) = c(h(v)) \text{ and } w \le v\}|\bigr) .$$

Note that $s$ is well-defined and consistent because $S$ is a solution. Furthermore, observe that
$s$ is injective. Finally, choose an arbitrary bijection $\ell : V(T) \to \{1, 2, \ldots, k\}$. We must now
have $P_r \not\equiv 0$ because we can uniquely reconstruct any three-tuple $(W, s, \ell)$ with a simple
$W = (T, h) \in \mathcal{W}_r$, an injective $s \in \mathbb{S}(W)$ and a bijective $\ell : V(T) \to \{1, 2, \ldots, k\}$ from its
monomial representation $\mathrm{mon}(W, s, \ell)$. Indeed, first recover $W = (T, h)$ from the $x_{a,b}$-type
indeterminates using the fact that $W$ is simple, properly ordered, and starts from $r$; then recover
$s$ from the $y_{a,t}$-type indeterminates using the fact that $h$ is injective; finally recover $\ell$ from
the $z_{t,j}$-type indeterminates using the fact that $s$ is injective.

($\Rightarrow$) Since $P_r \not\equiv 0$ there is a branching walk $W = (T, h) \in \mathcal{W}_r$, a shading $s \in \mathbb{S}(W)$ and a
bijective labelling $\ell : V(T) \to \{1, 2, \ldots, k\}$ such that the monomial $\mathrm{mon}(W, s, \ell)$ in $P_r$ has a
nonzero coefficient. We must derive a solution $S$ from $(W, s, \ell)$.

Let us first show that $\mathrm{mon}(W, s, \ell)$ has zero coefficient in $P_r$ unless $h$ is injective. Suppose
that $h$ is not injective; that is, $h(u_0) = h(v_0)$ for some distinct nodes $u_0, v_0 \in V(T)$. If there are
many such pairs, take the pair with lexicographically minimal value of
$(\min\{\ell(u_0), \ell(v_0)\}, \max\{\ell(u_0), \ell(v_0)\})$;
this pair is unique since $\ell$ is injective. Define $\ell' : V(T) \to \{1, 2, \ldots, k\}$ for all $v \in V(T)$ by

$$\ell'(v) = \begin{cases} \ell(v_0) & \text{if } v = u_0, \\ \ell(u_0) & \text{if } v = v_0, \\ \ell(v) & \text{otherwise.} \end{cases}$$

Similarly, define $s' : V(T) \to D$ for all $v \in V(T)$ by

$$s'(v) = \begin{cases} s(v_0) & \text{if } v = u_0, \\ s(u_0) & \text{if } v = v_0, \\ s(v) & \text{otherwise.} \end{cases}$$

We observe that $\ell'$ is bijective and that $\ell' \ne \ell$ because $\ell$ is bijective. Moreover, $s'$ is
consistent because $c(h(u_0)) = c(h(v_0))$ and $s$ is consistent. In this way, to the triple $(W, s, \ell)$
we associated a different triple $(W, s', \ell')$ such that $\mathrm{mon}(W, s, \ell) = \mathrm{mon}(W, s', \ell')$. Since $\mathbb{F}_{2^\beta}$
has characteristic 2, these two monomials cancel out in $P_r$. Conversely, if we begin from
$(W, s', \ell')$ and follow the same association rule, we get $(W, s, \ell)$. Hence, the set of all triples
$((T, h), s, \ell)$ in which the homomorphism $h$ is not injective is partitioned into pairs, and the
two monomials corresponding to each pair cancel out. It follows that the homomorphism $h$ is
injective in every triple $((T, h), s, \ell)$ in the preimage of a monomial with a nonzero coefficient
in $P_r$.

Next suppose that $h$ is injective but $s$ is not injective. Then there is a pair of distinct nodes
$u_0, v_0 \in V(T)$ that have the same shade $s(u_0) = s(v_0)$. Again, if there are many such pairs
we take the pair with lexicographically minimal value of $(\min\{\ell(u_0), \ell(v_0)\}, \max\{\ell(u_0), \ell(v_0)\})$;
this pair is unique since $\ell$ is injective. Define $\ell' : V(T) \to \{1, 2, \ldots, k\}$ as before. Again,
to the triple $(W, s, \ell)$ we assigned a different triple $(W, s, \ell')$, and again one can verify that
$\mathrm{mon}(W, s, \ell) = \mathrm{mon}(W, s, \ell')$. By a similar argument as above, we see that the two monomials
corresponding to these two triples cancel out. It follows that the shading $s$ is injective in every
triple $((T, h), s, \ell)$ in the preimage of a monomial with a nonzero coefficient in $P_r$.

So we must have that both $h$ and $s$ are injective in a three-tuple $((T, h), s, \ell)$ where the
monomial $\mathrm{mon}(W, s, \ell)$ in $P_r$ has a nonzero coefficient. Let $S = V(h(T))$. Because $h$ is
injective, $|S| = k$. Because $T$ is connected and $h$ is a homomorphism, we have that $h(T)$ and
hence $G[S]$ is connected. Since $s$ is consistent and injective, $S$ is a solution. □
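The pairing used in the ($\Rightarrow$) direction for non-injective $h$ can be checked mechanically on small examples. The sketch below is illustrative only: monomials are represented as multisets of variable names, and the function names `monomial` and `partner` are hypothetical. It swaps the labels and shades of the selected pair of nodes and lets one confirm that the monomial is unchanged and that the map is an involution.

```python
from collections import Counter

def monomial(parent, h, s, lab):
    """mon(W, s, l) as a multiset of indeterminates x_{a,b}, y_{a,t}, z_{t,j}."""
    mon = Counter()
    for v, p in parent.items():
        if p is not None:
            a, b = sorted((h[p], h[v]))
            mon['x', a, b] += 1
    for v in parent:
        mon['y', h[v], s[v]] += 1
        mon['z', s[v], lab[v]] += 1
    return mon

def partner(parent, h, s, lab):
    """Return (s', l') for a non-injective h, or None when h is injective."""
    pairs = [(u, v) for u in parent for v in parent if u < v and h[u] == h[v]]
    if not pairs:
        return None
    # the pair with lexicographically minimal (min label, max label)
    u0, v0 = min(pairs, key=lambda p: (min(lab[p[0]], lab[p[1]]),
                                       max(lab[p[0]], lab[p[1]])))
    s2, lab2 = dict(s), dict(lab)
    s2[u0], s2[v0] = s[v0], s[u0]
    lab2[u0], lab2[v0] = lab[v0], lab[u0]
    return s2, lab2
```

For the walk 1, 2, 1 on a single edge (`parent = {1: None, 2: 1, 3: 2}`, `h = {1: 1, 2: 2, 3: 1}`), the partner of any bijectively labelled shading has the same monomial but a different labelling, and applying `partner` twice returns the original pair.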

2.5 Evaluating the polynomials in time $O^*(2^k)$

In this section we show that for every vertex $r \in V$ the polynomial $P_r$ can be evaluated in a
given point $(x, y, z)$ in time $O^*(2^k)$. To this end, let us rewrite $P_r$ as a sum of $2^k$ polynomials
such that each of them can be evaluated in time polynomial in the input size. For each
$X \subseteq \{1, 2, \ldots, k\}$, let

$$P_r^X(x, y, z) = \sum_{W = (T,h) \in \mathcal{W}_r} \; \sum_{s \in \mathbb{S}(W)} \; \sum_{\ell : V(T) \to X} \mathrm{mon}(W, s, \ell) .$$

Note that we do not assume that the labellings in the third summation are bijective.

Lemma 2. $P_r(x, y, z) = \sum_{X \subseteq \{1,2,\ldots,k\}} P_r^X(x, y, z)$.

Proof. Let us fix a branching walk $W = (T, h) \in \mathcal{W}_r$ such that $|V(T)| = k$ and a shading
$s \in \mathbb{S}(W)$. Because $|V(T)| = k$, a function $\ell : V(T) \to \{1, 2, \ldots, k\}$ is bijective if and only if
it is surjective, so

$$\sum_{\substack{\ell : V(T) \to \{1,2,\ldots,k\} \\ \ell \text{ bijective}}} \mathrm{mon}(W, s, \ell) = \sum_{\substack{\ell : V(T) \to \{1,2,\ldots,k\} \\ \ell \text{ surjective}}} \mathrm{mon}(W, s, \ell) . \qquad (1)$$

Observing again that $\mathbb{F}_{2^\beta}$ has characteristic 2, and hence $-1 = 1$, we have, by the Principle
of Inclusion and Exclusion,

$$\sum_{\substack{\ell : V(T) \to \{1,2,\ldots,k\} \\ \ell \text{ surjective}}} \mathrm{mon}(W, s, \ell) = \sum_{X \subseteq \{1,2,\ldots,k\}} \; \sum_{\ell : V(T) \to X} \mathrm{mon}(W, s, \ell) . \qquad (2)$$

From (1) and (2) we immediately obtain

$$P_r(x, y, z) = \sum_{W = (T,h) \in \mathcal{W}_r} \; \sum_{s \in \mathbb{S}(W)} \; \sum_{X \subseteq \{1,2,\ldots,k\}} \; \sum_{\ell : V(T) \to X} \mathrm{mon}(W, s, \ell) . \qquad (3)$$

The claim follows by changing the order of summation. □
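The identity behind Lemma 2 holds for any weights in characteristic 2, not only for monomials, which makes it easy to test numerically. In the following sketch (an illustration, with the hypothetical name `check_sieve`), each labelling of a $k$-node walk receives a random 32-bit weight, and XOR plays the role of addition in characteristic 2.

```python
import random
from itertools import combinations, permutations, product

def check_sieve(k, seed=0):
    """Compare the sum over bijective labellings with the subset-sieved sum."""
    rng = random.Random(seed)
    labels = range(1, k + 1)
    # one arbitrary weight per labelling l : {1..k} -> {1..k}
    weight = {l: rng.getrandbits(32) for l in product(labels, repeat=k)}

    lhs = 0
    for l in permutations(labels):            # bijective labellings only
        lhs ^= weight[l]

    rhs = 0
    for size in range(k + 1):
        for X in combinations(labels, size):  # every subset X of the labels
            for l in product(X, repeat=k):    # every labelling into X
                rhs ^= weight[l]
    return lhs == rhs
```

A labelling with image $I$ is counted once for every $X \supseteq I$, that is, $2^{k - |I|}$ times, which is odd exactly when $I = \{1, 2, \ldots, k\}$; all other labellings cancel.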

Now we are left with the tedious job of evaluating $P_r^X(x, y, z)$ in polynomial time. This is
slightly technical because we consider properly ordered branching walks.

Lemma 3. Given a nonempty subset $X \subseteq \{1, 2, \ldots, k\}$ and three vectors $x, y, z$ of values
in $\mathbb{F}_{2^\beta}$ as input, the values of $P_r^X(x, y, z)$ for each $r \in V(G)$ can be computed by dynamic
programming in time $O(n^2 k^2 \beta^2)$ and space $O(n^2 k \beta^2)$.

Proof. Recall that we assume that $V(G) = \{1, 2, \ldots, n\}$. For a vertex $a \in V(G)$, denote the
ordered sequence of neighbors of $a$ in $G$ by $a_1 < a_2 < \cdots < a_{\deg_G(a)}$. For each $a \in V(G)$,
$1 \le i \le n$, and $0 \le l \le k$, denote by $\mathcal{W}(a, i, l)$ the set of properly ordered branching walks
$W = (T, h)$ such that (i) $W$ starts from $a$, (ii) for any child node $u$ of 1 in $T$ it holds that
$h(u) = a_j$ implies $j \ge i$, and (iii) $|V(T)| = l$.

Our objective is to compute a three-dimensional array $A_X$ whose entries are defined by

$$A_X[a, i, l] = \sum_{W = (T,h) \in \mathcal{W}(a,i,l)} \; \sum_{s \in \mathbb{S}_{V(T) \setminus \{1\}}(W)} \; \sum_{\ell : V(T) \setminus \{1\} \to X} \; \prod_{\{u,v\} \in E(T)} x_{h(u),h(v)} \prod_{v \in V(T) \setminus \{1\}} y_{h(v),s(v)} \, z_{s(v),\ell(v)} .$$

The entries of $A_X$ admit the following recurrence. For $i > \deg_G(a)$ or $l \le 1$, we have

$$A_X[a, i, l] = \begin{cases} 1 & \text{if } l = 1, \\ 0 & \text{otherwise.} \end{cases}$$

For $1 \le i \le \deg_G(a)$ and $2 \le l \le k$, we have

$$A_X[a, i, l] = A_X[a, i+1, l] + x_{a,a_i} \sum_{t \in D(c(a_i))} \sum_{j \in X} y_{a_i,t} \, z_{t,j} \cdot \sum_{\substack{l_1 + l_2 = l \\ l_1, l_2 \ge 1}} A_X[a, i+1, l_1] \cdot A_X[a_i, 1, l_2] , \qquad (4)$$

where $x_{a,a_i}$ denotes the indeterminate of the edge $\{a, a_i\}$.

To see that the recurrence is correct, observe that the two summands on the right-hand side of (4)
correspond to properly ordered branching walks in $\mathcal{W}(a, i, l)$ where either (a) there is no child
node $u$ of 1 in $T$ such that $h(u) = a_i$ or (b) there is a unique such child. (At most one such child
may exist because the branching walk is properly ordered.)

To recover the value of the polynomial $P_r^X(x, y, z)$, we observe that

$$P_r^X(x, y, z) = \sum_{t \in D(c(r))} \sum_{j \in X} y_{r,t} \, z_{t,j} \cdot A_X[r, 1, k] . \qquad (5)$$

The time bound follows by noting that the values of $\sum_{t \in D(c(a))} \sum_{j \in X} y_{a,t} z_{t,j}$ can be
computed in $O(kn\beta^2)$ time before filling the table $A_X$ and stored to accelerate the computations
for the individual entries of $A_X$. □
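The dynamic programming of Lemma 3 can be sketched compactly in Python. Several choices below are assumptions made for illustration, not part of the construction above: the field is fixed to $\mathbb{F}_{2^8}$ with the reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (large enough for $k \le 32$), a single random point is shared by all start vertices $r$, and the names `gf_mul`, `shades`, `eval_PX`, and `eval_P` are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

BETA, POLY = 8, 0x11B   # GF(2^8); an assumption made for this sketch

def gf_mul(a, b):
    """Multiply two elements of GF(2^BETA), encoded as integers below 2^BETA."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> BETA:
            a ^= POLY
    return r

def shades(motif):
    m = Counter(motif)
    return {q: [(q, i) for i in range(1, m[q] + 1)] for q in m}

def eval_PX(adj, color, D, k, X, x, y, z):
    """Return {r: P_r^X(x, y, z)} using recurrence (4) and identity (5)."""
    # g[a] = sum over shades t of c(a) and labels j in X of y[a,t] * z[t,j]
    g = {}
    for a in adj:
        acc = 0
        for t in D.get(color[a], []):
            for j in X:
                acc ^= gf_mul(y[a, t], z[t, j])
        g[a] = acc
    nbrs = {a: sorted(adj[a]) for a in adj}
    A = {}   # A[l, a][i] holds A_X[a, i, l]; index 0 of each row is unused
    for l in range(1, k + 1):
        for a in adj:
            d = len(nbrs[a])
            row = [0] * (d + 2)
            row[d + 1] = 1 if l == 1 else 0   # i > deg(a): one-node walk only
            for i in range(d, 0, -1):
                if l == 1:
                    row[i] = 1
                    continue
                b = nbrs[a][i - 1]            # the neighbour a_i
                conv = 0
                for l1 in range(1, l):        # split l = l1 + l2 with l1, l2 >= 1
                    conv ^= gf_mul(A[l1, a][i + 1], A[l - l1, b][1])
                edge = x[min(a, b), max(a, b)]
                row[i] = row[i + 1] ^ gf_mul(gf_mul(edge, g[b]), conv)
            A[l, a] = row
    return {r: gf_mul(g[r], A[k, r][1]) for r in adj}

def eval_P(adj, color, motif, k, rng):
    """Evaluate every P_r at one random point by summing P_r^X over nonempty X."""
    D = shades(motif)
    q = 1 << BETA
    x = {(a, b): rng.randrange(q) for a in adj for b in adj[a] if a < b}
    y = {(a, t): rng.randrange(q) for a in adj for t in D.get(color[a], [])}
    z = {(t, j): rng.randrange(q) for ts in D.values() for t in ts
         for j in range(1, k + 1)}
    total = {r: 0 for r in adj}
    for size in range(1, k + 1):
        for X in combinations(range(1, k + 1), size):
            for r, val in eval_PX(adj, color, D, k, X, x, y, z).items():
                total[r] ^= val
    return total
```

On a NO-instance every returned value is 0 regardless of the random point, because $P_r$ is then the zero polynomial; on a YES-instance a nonzero value appears with high probability. The empty set $X$ is skipped because it contributes nothing when $k \ge 1$.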
2.6 The decision algorithm

Lemma 4 (DeMillo and Lipton [5], Schwartz [17], Zippel [19]). Let $P(x_1, x_2, \ldots, x_m)$ be a
nonzero polynomial of degree at most $d$ over a field $F$ and let $S$ be a finite subset of $F$. Then,
the probability that $P$ evaluates to zero on a random element $(a_1, a_2, \ldots, a_m) \in S^m$ is bounded
by $d/|S|$.

Theorem 5. The Maximum Graph Motif problem admits a Monte Carlo algorithm that
runs in $O(2^k n^3 k^2 \beta^2)$ time and in polynomial space, with the following guarantees: (i) the
algorithm always returns NO when given a NO-instance as input, (ii) the algorithm returns
YES with probability at least $1 - 2^{-k}$ when given a YES-instance as input.

Proof. The algorithm is as follows. Iterate over each vertex $r \in V(G)$ in turn. Select values for
the variables in $x, y, z$ independently and uniformly at random from $\mathbb{F}_{2^\beta}$. Then iterate over all
$X \subseteq \{1, 2, \ldots, k\}$ and use the algorithm in Lemma 3 to evaluate the value $P_r^X(x, y, z)$.
Accumulate the sum of the values $P_r^X(x, y, z)$ to obtain $P_r(x, y, z)$ by Lemma 2. If $P_r(x, y, z) \ne 0$,
answer YES and stop, otherwise proceed to consider the next vertex $r$. If all vertices have
been considered, answer NO and stop.

When the input is a NO-instance, the polynomial $P_r$ is the zero polynomial by Lemma 1, so
the algorithm returns NO. When the input is a YES-instance, let $S \subseteq V(G)$ be an arbitrary
solution. It holds by Lemma 1 that the polynomial $P_r$ is nonzero for every $r \in S$. In
particular, the algorithm answers NO only if for every $r \in S$ the polynomial $P_r$ evaluated
to 0 at a random point $(x, y, z)$. The degree of $P_r$ is exactly $3k - 1$, while the size of $\mathbb{F}_{2^\beta}$ is
$2^{\lceil \log k \rceil + 3}$. Thus, by Lemma 4 the probability that for a single $r \in S$ the polynomial $P_r$
evaluated to 0 is bounded by $(3k-1)/2^{\lceil \log k \rceil + 3} < 1/2$. Since the random points for different
$r$ are chosen independently, it follows that the probability that for every $r \in S$
the polynomial $P_r$ evaluates to 0 is at most $2^{-k}$. □
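The arithmetic in the failure-probability bound can be checked exactly for small $k$. The sketch below assumes the field size $2^\beta$ with $\beta = \lceil \log k \rceil + 3$ from Section 2.4 and uses exact rational arithmetic; the function name `failure_bound` is hypothetical.

```python
from fractions import Fraction

def failure_bound(k):
    """Return (beta, per-vertex bound, bound over the k vertices of a solution)."""
    beta = (k - 1).bit_length() + 3               # ceil(log2 k) + 3 for k >= 1
    per_vertex = Fraction(3 * k - 1, 2 ** beta)   # degree / field size (Lemma 4)
    return beta, per_vertex, per_vertex ** k
```

Since $2^\beta \ge 8k$, the per-vertex bound is below $3/8$, so the product over the $k$ independent trials for the vertices of a solution stays below $2^{-k}$.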

3 Variants of the Graph Motif Problem 

In Section 2 we described an algorithm for Maximum Graph Motif. It is easy to see
that the algorithm can also be used to solve classical Graph Motif by setting $k = |M|$.
Another variant of the problem studied in the literature is the list version of the problem,
where every vertex $a \in V(G)$ is assigned a set of colors $C(a) \subseteq C$, not just one color $c(a)$,
and for every vertex in a solution we can choose any of its colors to match the multiset
$M$. It is straightforward to modify our algorithm for Maximum Graph Motif to solve
the list version: in the dynamic programming of Lemma 3, in (4) instead of summing over
$t \in D(c(a_i))$ we sum over $t \in \bigcup_{q \in C(a_i)} D(q)$, and similarly in (5).

Although Maximum Graph Motif (introduced in earlier work) is a natural optimization version
of the problem, it is not the only one. Two more optimization variants were introduced in [7];
we describe their decision versions below.
Min-Add

Input: Graph $G = (V, E)$, a coloring $c : V \to C$, a multiset of colors $M$, and $d \in \mathbb{N}$.
Question: Is there a subset $S \subseteq V$ such that $G[S]$ is connected, $M \subseteq c(S)$ and
$|c(S) \setminus M| \le d$?

Min-Substitute

Input: Graph $G = (V, E)$, a coloring $c : V \to C$, a multiset of colors $M$, and $d \in \mathbb{N}$.
Question: Is there a subset $S \subseteq V$ such that $G[S]$ is connected and $c(S)$ can be obtained
from $M$ with at most $d$ substitutions?

In this paper we introduce a new variant, which is a generalization of Maximum Graph
Motif, Min-Add and Min-Substitute. We believe that it might be useful in bioinformatics
applications.

Consider the following three operations on a multiset $M$ over a set of colors $C$:

1. insertion (I): adds one copy of $c \in C$ to $M$,

2. deletion (D): removes one copy of $c \in M$ from $M$,

3. substitution (S): removes one copy of $c_1 \in M$ from $M$ and adds one copy of $c_2 \in C$ to
$M$.

Associate with each of the three operations a nonnegative integer cost $\kappa_I, \kappa_D, \kappa_S$. Consider a
sequence $\sigma$ of the three operations applied to a multiset $M$. Let $m_I, m_D, m_S$ be the numbers
of insertions, deletions, and substitutions in $\sigma$. Then the cost of $\sigma$ is defined as
$m_I \kappa_I + m_D \kappa_D + m_S \kappa_S$. Moreover, for two multisets $M$ and $M'$, the weighted edit distance is
defined as the minimum cost $\kappa(M, M')$ of a sequence of operations that turns $M$ into $M'$.

Closest Graph Motif

Input: Graph $G = (V, E)$, a coloring $c : V \to C$, a multiset of colors $M$, and numbers
$d, \kappa_I, \kappa_D, \kappa_S \in \mathbb{N}$.

Question: Is there a subset $S \subseteq V$ such that $G[S]$ is connected and $\kappa(M, c(S)) \le d$?

Note that Min-Add reduces to Closest Graph Motif just by putting $\kappa_I = 1$, $\kappa_D =
\kappa_S = d + 1$. Similarly, for Min-Substitute put $\kappa_S = 1$, $\kappa_I = \kappa_D = d + 1$.
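The weighted edit distance between two multisets can be computed directly for experimentation. The closed form used below is an observation of this sketch rather than a statement from the text: with nonnegative costs, an optimal sequence never touches the common part of $M$ and $M'$, so it suffices to choose how many of the surplus copies are substituted instead of deleted. The function name `weighted_edit_distance` is hypothetical.

```python
from collections import Counter

def weighted_edit_distance(M, M2, k_ins, k_del, k_sub):
    """Minimum cost of turning the multiset M into M2 (nonnegative costs assumed)."""
    a = sum((Counter(M) - Counter(M2)).values())   # copies present only in M
    b = sum((Counter(M2) - Counter(M)).values())   # copies present only in M2
    # s substitutions pair up s surplus copies of M with s missing copies of M2;
    # the remaining a - s copies are deleted and b - s copies are inserted.
    return min(s * k_sub + (a - s) * k_del + (b - s) * k_ins
               for s in range(min(a, b) + 1))
```

With the Min-Add costs ($\kappa_I = 1$, $\kappa_D = \kappa_S = d + 1$) and $d = 2$, turning $\{a, b\}$ into $\{a, b, c\}$ costs 1, while turning $\{a, b\}$ into $\{a, c\}$ costs 3 and therefore exceeds the budget, as intended by the reduction.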

We next describe an algorithm for Closest Graph Motif subject to the parameterization
that we are given an additional integer $k \in \mathbb{N}$ as input and the subset $S$ must satisfy
$|S| = k$. We will present an $O^*(2^k)$-time polynomial space algorithm, assuming that $d$ is
bounded by a polynomial function in $n$. Note that when parameterized by the edit distance
(which also seems natural) the problem is unlikely to admit an FPT algorithm since Graph
Motif is NP-hard. Since the algorithm is a rather straightforward extension of the algorithm
for Maximum Graph Motif presented in Section 2, we only sketch it by describing
the modifications needed to handle the more general problem.

We proceed to define an analog of the polynomial $P_r$ from Section 2. We use the same
indeterminates as before, with additional indeterminates for tracking substitutions and the
edit distance. Towards this end, for each $a \in V(G)$, introduce the indeterminate $w_a$. Denote
by $w$ the sequence of all such indeterminates. Introduce one further indeterminate $\eta$ for
tracking the edit distance.

Recall that in the polynomial $P_r$, every monomial corresponds to a consistently shaded
and bijectively labelled branching walk $(W, s, \ell)$. In the new polynomial $Q_r$, every monomial
corresponds to a quadruple $(W, f, s, \ell)$, where $f : V(T) \to \{0, 1\}$ is an indicator function for
substitutions. That is, $f(v) = 1$ and $s(v) = (q, i)$ for a node $v$ means that a copy of color $q$
(the copy corresponding to the $i$-th shade of $q$) is substituted by the color $c(h(v))$. We need
to also modify the notion of a consistent shading in order to make it accept insertions and
substitutions. For a color $q \in C$ let $D'(q) = \{(q, i) : i = 1, 2, \ldots, m(q) + \lfloor d/\kappa_I \rfloor\}$ and let
$D' = \bigcup_{q \in C} D'(q)$. In $Q_r$ the shading function $s$ maps $V(T)$ to $D'$. The meaning of this
modification is that the shade $(q, i)$ for $i > m(q)$ corresponds to an inserted shade of color
$q$. Accordingly, the notion of consistency of a shading requires a modification. We say that
a (partial) shading $s : U \to D'$ of a branching walk $W = (T, h)$ and a substitution indicator
$f : V(T) \to \{0, 1\}$ is consistent if for every node $v \in U \subseteq V(T)$ one of the following conditions
holds: (i) if $f(v) = 0$ then $s(v) = (c(h(v)), i)$ for some $i$, or (ii) if $f(v) = 1$ then $s(v) \in D$.
For a given branching walk $W = (T, h)$, a substitution indicator $f : V(T) \to \{0, 1\}$, and a
set $U \subseteq V(T)$, denote by $\mathbb{S}_U(W, f)$ the set of all (partial) shadings $s : U \to D'$ of $W$ that are
consistent. Let us abbreviate $\mathbb{S}(W, f) = \mathbb{S}_{V(T)}(W, f)$.

The following lemma will be useful in the construction of $Q_r$.

Lemma 6. Consider a sequence $\sigma$ of $m_I$ insertions, $m_S$ substitutions and a number of
deletions which transforms a multiset $M$ into a multiset $M'$ of size $k$. Then, the cost of $\sigma$ is equal
to $m_I(\kappa_I + \kappa_D) + m_S \kappa_S + (|M| - k)\kappa_D$.

Proof. The claim follows since clearly $\sigma$ contains $|M| + m_I - k$ deletions. □
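The count behind Lemma 6 can be spelled out as follows; this is a short worked derivation using only the quantities of the lemma, with $m_D$ denoting the number of deletions in $\sigma$. Insertions increase the size of the multiset by one, deletions decrease it by one, and substitutions leave it unchanged, so

$$|M| + m_I - m_D = k \quad\Longrightarrow\quad m_D = |M| + m_I - k ,$$

$$m_I \kappa_I + m_D \kappa_D + m_S \kappa_S = m_I \kappa_I + (|M| + m_I - k)\kappa_D + m_S \kappa_S = m_I(\kappa_I + \kappa_D) + m_S \kappa_S + (|M| - k)\kappa_D .$$

For instance, with $|M| = 5$, $k = 4$, $m_I = 1$ and $m_S = 2$ we get $m_D = 2$, and both sides equal $\kappa_I + 2\kappa_D + 2\kappa_S$.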

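By the above lemma, given a quadruple $(W, f, s, \ell)$ the cost of the sequence of operations
corresponding to this quadruple is

$$\kappa(f, s) = \Bigl( \sum_{q \in C} \sum_{i=1}^{\lfloor d/\kappa_I \rfloor} \bigl| s^{-1}\bigl((q, m(q) + i)\bigr) \bigr| \Bigr) (\kappa_I + \kappa_D) + |f^{-1}(1)| \, \kappa_S + (|M| - k)\kappa_D .$$

For a substitution indicator function $f : V(T) \to \{0, 1\}$ and a homomorphism $h : V(T) \to
V(G)$, let us write $w^{f,h} = \prod_{u \in V(T)} w_{h(u)}^{f(u)}$ for the indicator monomial of $f$ given $h$.
Now we are ready to define the polynomial

$$Q_r(x, y, z, w, \eta) = \sum_{W = (T,h) \in \mathcal{W}_r} \; \sum_{f : V(T) \to \{0,1\}} \; \sum_{s \in \mathbb{S}(W,f)} \; \sum_{\substack{\ell : V(T) \to \{1,2,\ldots,k\} \\ \ell \text{ bijective}}} \mathrm{mon}(W, s, \ell) \, w^{f,h} \, \eta^{\kappa(f,s)} .$$

Lemma 7. Let $Q_r(x, y, z, w, \eta) = \sum_{i \ge 0} Q_r^i(x, y, z, w) \, \eta^i$. Then, we have $Q_r^i \not\equiv 0$ for an $i \le d$
if and only if there exists a solution $S \subseteq V(G)$ with $r \in S$ and $\kappa(M, c(S)) \le d$.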

Proof. Analogous to the proof of Lemma 1, with the following modifications to handle the
substitution indicator $f$.

($\Leftarrow$) Recover $h$ as before, use the editing sequence for $i = \kappa(M, c(S)) \le d$ to construct
a substitution indicator $f$, then define $s$ and $\ell$. To conclude that $Q_r^i \not\equiv 0$, reconstruct a
quadruple $(W, s, f, \ell)$ from its monomial representation as before but with the additional
observation that when $h$ is injective we can recover $f$ from $w^{f,h}$.

($\Rightarrow$) When $h$ is not injective, define $f'$ from $f$ by transposing the images of $u_0$ and $v_0$
under $f$. When $h$ is injective but $s$ is not injective, proceed as before. Construct the solution
$S$ as before. From $f$ and $s$ we can read off a sequence of edits to transform $M$ to $c(S)$ with
cost $i \le d$. □
It is immediate that once we can evaluate polynomials $Q_r$ in a given vector $(x, y, z, w, \eta)$ we
can test whether the polynomials $Q_r^i$ are nonzero using the DeMillo-Lipton-Schwartz-Zippel
Lemma and Lagrange interpolation. By Lemma 2 the task of evaluating $Q_r$ in time $O^*(2^k)$
boils down to evaluating $Q_r^X$ in polynomial time (where $Q_r^X$ is defined analogously to $P_r^X$).
The evaluation proceeds as in Lemma 3 with minor changes to the dynamic programming
recurrence. In particular, in (4) we change the expression $\sum_{s \in D(c(a_i))} \sum_{j \in X} y_{a_i,s} z_{s,j}$ into

$$\sum_{s \in D(c(a_i))} \sum_{j \in X} y_{a_i,s} \, z_{s,j} \;+\; \sum_{p = m(c(a_i)) + 1}^{m(c(a_i)) + \lfloor d/\kappa_I \rfloor} \sum_{j \in X} y_{a_i,(c(a_i),p)} \, z_{(c(a_i),p),j} \, \eta^{\kappa_I + \kappa_D} \;+\; \sum_{s \in D} \sum_{j \in X} y_{a_i,s} \, z_{s,j} \, w_{a_i} \, \eta^{\kappa_S} .$$

The three summands correspond to the three possibilities: (i) $a_i$ just gets a color from $M$, (ii)
$a_i$ gets a new copy of the color $c(a_i)$ that is inserted into $M$, (iii) one copy of a color from $M$
is substituted by one copy of the color $c(a_i)$. A similar change is required for the expression
(5), including also multiplication of the whole expression by $\eta^{(|M| - k)\kappa_D}$. (Alternatively, one
may offset the final edit distance by $(|M| - k)\kappa_D$.)

We conclude with the following theorem which follows from considerations in this section.
The additional factor of $nd$ in the running time is caused by the use of Lagrange interpolation,
which requires $O(nd)$ evaluations of $Q_r$. Note that the degree of $\eta$ in $Q_r$ is bounded by
$(k + |M|)(\kappa_I + \kappa_D + \kappa_S) = O(nd)$ because we can assume $\kappa_I, \kappa_D, \kappa_S \le d + 1$. We should also
note that to get the bound on the error probability as before, the size of the finite field $\mathbb{F}_{2^\beta}$
should be extended by a factor of $\Omega(nd)$.

Theorem 8. The Closest Graph Motif problem admits a Monte Carlo algorithm that
runs in $O(2^k n^4 k^2 d \beta^2)$ time and in polynomial space, with the following guarantees: (i) the
algorithm always returns NO when given a NO-instance as input, (ii) the algorithm returns
YES with probability at least $1 - 2^{-k}$ when given a YES-instance as input.

4 A reduction from Set Cover 

In the Set Cover problem we are given an integer $t$ and a family of sets $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$
over the universe $U = \bigcup_{j=1}^{m} S_j$ with $n = |U|$. The task is to determine whether there is a
subfamily of $t$ sets $S_{i_1}, S_{i_2}, \ldots, S_{i_t}$ such that $U = \bigcup_{j=1}^{t} S_{i_j}$.

Cygan et al. proved the following result (see Theorem 4.4 in [4]).¹

Theorem 9 (Cygan et al. [4]). If Set Cover can be solved in $O((2 - \epsilon)^{n+t})$ time for some
$\epsilon > 0$, then it can also be solved in $O((2 - \epsilon')^{n})$ time, for some $\epsilon' > 0$.

We use Theorem 9 to show the following.

Theorem 10. If Graph Motif can be solved in $O((2 - \epsilon)^{k})$ time for some $\epsilon > 0$, then Set
Cover can be solved in $O((2 - \epsilon')^{n})$ time, for some $\epsilon' > 0$. Moreover, this holds even when
we consider Graph Motif restricted to one of the following two extreme cases: 

(i) M is a set, 

(ii) M has only two distinct colors. 

¹ Actually, Theorem 4.4 is stated in a slightly different way, taking into account the maximum size of the sets
$S_i$, but Theorem 9 follows immediately from their proof.
Proof. Let $(\mathcal{S}, t)$ be an instance of Set Cover. We are going to show a polynomial-time
reduction to Graph Motif so that in the resulting instance $(G, c, M)$ the multiset $M$ has
cardinality $n + t + 1$. Clearly, combined with Theorem 9 this will prove our claim.

Graph $G = (V, E)$ is defined as follows. The vertex set consists of $U$, $t$ copies of the
family $\mathcal{S}$ and a special vertex $r$, that is, $V = U \cup \{s_i^j : i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, t\} \cup \{r\}$.
Moreover, $E = \{e s_i^j : e \in S_i\} \cup \{r s_i^j : i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, t\}$.

To establish case (i), let $M = \{1, 2, \ldots, n + t + 1\}$. Moreover, we put $c(s_i^j) = j$ for every
$i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, t$. Further, $c(r) = t + 1$. The $n$ colors $t + 2, t + 3, \ldots, n + t + 1$
are assigned bijectively to the vertices from $U$. Now we show that $(\mathcal{S}, t)$ is a YES-instance of
Set Cover iff $(G, c, M)$ is a YES-instance of Graph Motif. Assume $S_{i_1}, S_{i_2}, \ldots, S_{i_t}$ is a
solution to Set Cover. Then let $S = \{r\} \cup U \cup \{s_{i_j}^j : j = 1, 2, \ldots, t\}$. It is clear that the
multiset of colors on $S$ matches $M$. Obviously, $G[\{r\} \cup \{s_{i_j}^j : j = 1, 2, \ldots, t\}]$ is connected.
Since for every $e \in U$ there is $j \in \{1, 2, \ldots, t\}$ such that $e \in S_{i_j}$, we have $e s_{i_j}^j \in E(G[S])$. It follows
that $G[S]$ is connected, and hence $S$ is a solution for Graph Motif. Conversely, if $S$ is a
solution for Graph Motif in $(G, c, M)$ then for every $j = 1, 2, \ldots, t$ there is exactly one
$i_j \in \{1, 2, \ldots, m\}$ such that $s_{i_j}^j \in S$, since the colors of $S$ match $M$. Moreover, since $G[S]$
is connected we infer that for every $e \in U$ there is $j \in \{1, 2, \ldots, t\}$ such that $e s_{i_j}^j \in E$.
However, then $e \in S_{i_j}$ and it follows that $S_{i_1}, S_{i_2}, \ldots, S_{i_t}$ is a solution for Set Cover.

To establish case (ii), let $M$ consist of $n + 1$ copies of color 1 and $t$ copies of color 2. We
put $c(r) = 1$ and $c(e) = 1$ for every $e \in U$. All the remaining vertices are colored with 2. The
equivalence can be shown very similarly to case (i); we skip the details. □
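As an illustration only, the construction for case (i) can be sketched in Python together with a brute-force checker for tiny instances. The function names, the vertex encodings `("u", e)` and `("s", i, j)`, and the exhaustive search are assumptions made for this sketch; the reduction itself only needs the polynomial-time construction.

```python
from itertools import combinations


def set_cover_to_graph_motif(sets, t):
    """Build (vertices, edges, color, motif) following case (i) of the proof."""
    universe = sorted(set().union(*sets))
    color = {"r": t + 1}                      # special vertex r gets color t + 1
    edges = set()
    for j in range(1, t + 1):                 # t copies of the family
        for i, members in enumerate(sets):
            v = ("s", i, j)                   # j-th copy of the i-th set
            color[v] = j
            edges.add(frozenset(("r", v)))
            for e in members:
                edges.add(frozenset((("u", e), v)))
    for pos, e in enumerate(universe):        # colors t + 2, ..., n + t + 1
        color[("u", e)] = t + 2 + pos
    motif = list(range(1, len(universe) + t + 2))
    return list(color), edges, color, motif


def is_connected(subset, edges):
    """Depth-first search restricted to the induced subgraph on `subset`."""
    subset = set(subset)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in subset - seen:
            if frozenset((v, w)) in edges:
                seen.add(w)
                stack.append(w)
    return seen == subset


def has_motif(vertices, edges, color, motif):
    """Exhaustive Graph Motif check; exponential, for tiny instances only."""
    target = sorted(motif)
    return any(
        sorted(color[v] for v in cand) == target and is_connected(cand, edges)
        for cand in combinations(vertices, len(motif))
    )
```

For example, with the hypothetical family `[{1, 2}, {2, 3}, {3, 4}]` the resulting Graph Motif instance is a YES-instance for `t = 2` and a NO-instance for `t = 1`, in agreement with Set Cover.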
APPENDIX



A Constrained multilinear detection 

Let $C$ be an arithmetic circuit that represents a multivariate polynomial $P(x_1, x_2, \ldots, x_n)$ in
the indeterminates $X = \{x_1, x_2, \ldots, x_n\}$ over a finite field $\mathbb{F}_{2^\beta}$ of order $2^\beta$. Associate with each
$i \in \{1, 2, \ldots, n\}$ a hue $h(i)$ in a set of hues $H$. For each hue $u \in H$, let $m(u)$ be the multiplicity
of hue $u$. The $k$-constrained multilinear detection problem asks us to decide whether there
exists a $K \subseteq \{1, 2, \ldots, n\}$ such that (i) $|K| = k$, (ii) $\prod_{i \in K} x_i$ is a monomial of $P(x_1, x_2, \ldots, x_n)$
with a nonzero coefficient, and (iii) for all $u \in H$ it holds that $|h^{-1}(u) \cap K| \le m(u)$.
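To make conditions (i)-(iii) concrete, here is a minimal reference check in Python. It assumes, purely for illustration, that $P$ is given as an explicit list of exponent vectors of its monomials with nonzero coefficients (rather than as a circuit), with indices starting at 0; the function name and input format are hypothetical.

```python
from collections import Counter


def constrained_multilinear_detect(monomials, hue, mult, k):
    """Return True iff some monomial satisfies conditions (i)-(iii).

    monomials: iterable of exponent tuples (d_1, ..., d_n)
    hue:       hue[i] is the hue h(i) of indeterminate x_i
    mult:      mult[u] is the multiplicity m(u) of hue u
    """
    for exps in monomials:
        # (i) and (ii): the monomial must be multilinear of degree exactly k.
        if sum(exps) != k or any(d > 1 for d in exps):
            continue
        support = [i for i, d in enumerate(exps) if d == 1]
        counts = Counter(hue[i] for i in support)
        # (iii): no hue is used more often than its multiplicity allows.
        if all(counts[u] <= mult[u] for u in counts):
            return True
    return False
```

This enumeration is only practical when the monomial list is short; the point of the algorithm below is to avoid expanding $P$ at all.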

Let us assume that $C$ has $g$ gates, each of fan-in 2 and unbounded fan-out, where each
gate may be of one of the following types: an addition gate, a negation gate, a multiplication
gate, a constant gate, or an input gate. Each constant gate evaluates to a fixed value in $\mathbb{F}_{2^\beta}$,
and each input gate evaluates to one of the indeterminates in $X$. Without loss of generality,
by embedding $\mathbb{F}_{2^\beta}$ into a larger field of characteristic 2 as necessary, we can assume that
$|\mathbb{F}_{2^\beta}| = 2^\beta \ge 4k$. Let $m = \max_{u \in H} m(u)$.

Theorem 11. There exists a randomized algorithm that solves $k$-constrained multilinear
detection in time $O(2^k g k^2 \beta^2 + n m k \beta^2)$. The algorithm gives a correct output on a YES-instance
with probability at least 1/2 and on a NO-instance with probability 1.

The following subsections give a proof of Theorem 11. We start by describing the algorithm
and then establish its correctness.

A.1 Algorithm

Let $L$ be a set of $k$ distinct labels. For each hue $u \in H$, let $S(u)$ be a set of $m(u)$ distinct
saturations of the hue $u$. For each $i \in \{1, 2, \ldots, n\}$ and $s \in S(h(i))$, introduce an indeterminate
$z_{i,s}$. For each $u \in H$, $s \in S(u)$, and $\ell \in L$, introduce an indeterminate $w_{u,s,\ell}$. Let $r$ be a yet
further indeterminate.

Execute the following algorithm. First assign values to all the indeterminates $z_{i,s}$ and $w_{u,s,\ell}$
independently and uniformly at random from the field $\mathbb{F}_{2^\beta}$. Then, for each $i \in \{1, 2, \ldots, n\}$
and $\ell \in L$, compute

$$
y_{i,\ell} = \sum_{s \in S(h(i))} z_{i,s}\, w_{h(i),s,\ell}\, r , \tag{6}
$$

and observe that each such expression is a monomial in $r$ over $\mathbb{F}_{2^\beta}$. Next, iterate over all
$A \subseteq L$ and evaluate the circuit $C$ with inputs

$$
x_{i,A} = \sum_{\ell \in L \setminus A} y_{i,\ell} \tag{7}
$$

for $i = 1, 2, \ldots, n$. Accumulate the sum of the evaluations to obtain

$$
Q(r) = \sum_{A \subseteq L} P(x_{1,A}, x_{2,A}, \ldots, x_{n,A}) , \tag{8}
$$

and observe that $Q(r) = \sum_j q_j r^j$ is a polynomial in the indeterminate $r$ with coefficients
$q_j \in \mathbb{F}_{2^\beta}$.

Terminate the algorithm by giving the output YES if $q_k \neq 0$; otherwise terminate the
algorithm by giving the output NO.
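The steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not in the text: the field is fixed to $\mathbb{F}_{2^8}$ with the modulus $x^8 + x^4 + x^3 + x + 1$ (so the requirement $2^\beta \ge 4k$ limits it to $k \le 64$), the circuit is replaced by a Python callable `P` built from `+` and `*`, indices start at 0, and the names `gf_mul`, `TruncPoly`, and `run_once` are hypothetical.

```python
import random

BETA = 8
MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two elements of GF(2^8), reducing modulo MODULUS."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        a <<= 1
        if a >> BETA:
            a ^= MODULUS
        b >>= 1
    return res


class TruncPoly:
    """Polynomial in r over GF(2^8), truncated to a fixed number of coefficients."""

    def __init__(self, coeffs):
        self.c = list(coeffs)

    def __add__(self, other):
        return TruncPoly(a ^ b for a, b in zip(self.c, other.c))

    def __mul__(self, other):
        size = len(self.c)
        out = [0] * size
        for i, a in enumerate(self.c):
            if a:
                for j in range(size - i):  # drop terms of degree >= size
                    out[i + j] ^= gf_mul(a, other.c[j])
        return TruncPoly(out)


def run_once(P, hue, mult, k, rng):
    """One randomized run; returns True (YES) iff the coefficient q_k is nonzero."""
    n = len(hue)
    z = {(i, s): rng.randrange(1 << BETA)
         for i in range(n) for s in range(mult[hue[i]])}
    w = {(u, s, l): rng.randrange(1 << BETA)
         for u in mult for s in range(mult[u]) for l in range(k)}
    # Coefficient of r in y_{i,l}, as in (6).
    y = [[0] * k for _ in range(n)]
    for i in range(n):
        for l in range(k):
            for s in range(mult[hue[i]]):
                y[i][l] ^= gf_mul(z[i, s], w[hue[i], s, l])
    q_k = 0
    for mask in range(1 << k):  # mask encodes the subset A of L
        inputs = []
        for i in range(n):
            coeff = 0
            for l in range(k):
                if not (mask >> l) & 1:  # l lies in the complement of A, as in (7)
                    coeff ^= y[i][l]
            inputs.append(TruncPoly([0, coeff] + [0] * (k - 1)))
        q_k ^= P(inputs).c[k]  # accumulate the r^k coefficient of (8)
    return q_k != 0
```

A possible use: with `hue = ["a", "a", "b"]`, `mult = {"a": 1, "b": 1}` and `k = 2`, the callable `lambda x: x[0] * x[1] + x[1] * x[2]` is a YES-instance because of the monomial $x_1 x_2$ (0-indexed), whereas `lambda x: x[0] * x[1]` never yields YES. Since a single run may fail on a YES-instance, one would repeat `run_once` a few times in practice.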



A.2 Correctness

The algorithm runs in the time given in Theorem 11 if we truncate all computations with
polynomials in $r$ to monomials of degree at most $k$. To establish correctness of the algorithm, we
must show that (i) when $C$ is a YES-instance we have $q_k \neq 0$ with probability at least 1/2,
and (ii) when $C$ is a NO-instance, $q_k = 0$ with probability 1.

By linearity it suffices to consider the contribution of one monomial of $P(x_1, x_2, \ldots, x_n)$
to the coefficient $q_k$. Let $x_1^{d_1} x_2^{d_2} \cdots x_n^{d_n}$ be an arbitrary monomial of $P(x_1, x_2, \ldots, x_n)$ with a
nonzero coefficient. From (8), (7), and (6) we have that $x_1^{d_1} x_2^{d_2} \cdots x_n^{d_n}$ does not contribute to
$q_k$ unless $d_1 + d_2 + \cdots + d_n = k$. In what follows we may thus assume $d_1 + d_2 + \cdots + d_n = k$.

Non-bijective labelings cancel. Before proceeding further it will be convenient to
introduce a notational shorthand for an $n$-tuple $(f_1, f_2, \ldots, f_n)$ of functions

$$
f_1 : \{1, 2, \ldots, d_1\} \to L, \quad f_2 : \{1, 2, \ldots, d_2\} \to L, \quad \ldots, \quad f_n : \{1, 2, \ldots, d_n\} \to L .
$$

Namely, we write $f_1 \cup f_2 \cup \cdots \cup f_n = L$ to indicate that for every $\ell \in L$ there must exist an
$f_i$ and a $j \in \{1, 2, \ldots, d_i\}$ such that $f_i(j) = \ell$. Because $d_1 + d_2 + \cdots + d_n = k$ and $|L| = k$, it
holds that each function $f_i$ is injective if $f_1 \cup f_2 \cup \cdots \cup f_n = L$.

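By linearity, (8), and (7), the contribution of $x_1^{d_1} x_2^{d_2} \cdots x_n^{d_n}$ to $q_k$ is exactly

$$
\begin{aligned}
\sum_{A \subseteq L} x_{1,A}^{d_1} x_{2,A}^{d_2} \cdots x_{n,A}^{d_n}
&= \sum_{A \subseteq L} \prod_{i=1}^{n} x_{i,A}^{d_i} \\
&= \sum_{A \subseteq L} \prod_{i=1}^{n} \Bigl( \sum_{\ell \in L \setminus A} y_{i,\ell} \Bigr)^{d_i} \\
&= \sum_{A \subseteq L} \prod_{i=1}^{n} \sum_{f_i : \{1,2,\ldots,d_i\} \to L \setminus A} \prod_{j=1}^{d_i} y_{i,f_i(j)} \\
&= \sum_{A \subseteq L} \sum_{\substack{f_1 : \{1,2,\ldots,d_1\} \to L \setminus A \\ f_2 : \{1,2,\ldots,d_2\} \to L \setminus A \\ \vdots \\ f_n : \{1,2,\ldots,d_n\} \to L \setminus A}} \prod_{i=1}^{n} \prod_{j=1}^{d_i} y_{i,f_i(j)} \\
&= \sum_{\substack{f_1 : \{1,2,\ldots,d_1\} \to L \\ f_2 : \{1,2,\ldots,d_2\} \to L \\ \vdots \\ f_n : \{1,2,\ldots,d_n\} \to L \\ f_1 \cup f_2 \cup \cdots \cup f_n = L}} \prod_{i=1}^{n} \prod_{j=1}^{d_i} y_{i,f_i(j)} .
\end{aligned}
\tag{9}
$$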

Here the last equality follows by the principle of inclusion and exclusion; indeed, if we do
not have $f_1 \cup f_2 \cup \cdots \cup f_n = L$ for the expression $\prod_{i=1}^{n} \prod_{j=1}^{d_i} y_{i,f_i(j)}$, then the sum over $A \subseteq L$
will cancel this expression over $\mathbb{F}_{2^\beta}$ because it occurs an even number of times in the sum and
$\mathbb{F}_{2^\beta}$ has characteristic 2.
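The parity argument can be checked by brute force for small $k$. In the sketch below (illustrative names, labels encoded as $0, \ldots, k-1$), a labelling that uses a set $T$ of labels is counted once for every $A \subseteq L$ disjoint from $T$, that is, $2^{k - |T|}$ times, which is odd only when $T = L$.

```python
from itertools import product


def times_counted(labels_used, k):
    """Number of subsets A of L = {0, ..., k-1} that avoid every used label."""
    used = set(labels_used)
    return sum(
        1
        for mask in range(1 << k)              # mask encodes the subset A
        if all(not (mask >> l) & 1 for l in used)
    )


def parity_check(k):
    """Verify the count 2^(k - |T|) for every k-tuple of labels."""
    for labelling in product(range(k), repeat=k):
        distinct = len(set(labelling))
        count = times_counted(labelling, k)
        if count != 2 ** (k - distinct):
            return False
        if (count % 2 == 1) != (distinct == k):
            return False
    return True
```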

Non-multilinear monomials cancel. Our next objective is to show that the last sum
in (9) is zero if there is at least one $i = 1, 2, \ldots, n$ with $d_i \ge 2$. Suppose this is the case and
let $i_0$ be the least such value $i$. We proceed to define a fixed-point-free involution on the space
of mappings $(f_1, f_2, \ldots, f_n)$ with $f_1 \cup f_2 \cup \cdots \cup f_n = L$.

Because $f_{i_0}$ is injective, we can define $f'_{i_0} \neq f_{i_0}$ by setting

$$
f'_{i_0}(j) =
\begin{cases}
f_{i_0}(2) & \text{if } j = 1, \\
f_{i_0}(1) & \text{if } j = 2, \\
f_{i_0}(j) & \text{otherwise.}
\end{cases}
\tag{10}
$$

Similarly, define $f'_i = f_i$ for all $i \neq i_0$. Now observe that we have (i) $(f'_1, f'_2, \ldots, f'_n) \neq
(f_1, f_2, \ldots, f_n)$, (ii) $(f''_1, f''_2, \ldots, f''_n) = (f_1, f_2, \ldots, f_n)$, and (iii) $f'_1 \cup f'_2 \cup \cdots \cup f'_n = L$. Thus,
$f \mapsto f'$ is a fixed-point-free involution. Furthermore, from (10) we have

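$$
\begin{aligned}
\prod_{i=1}^{n} \prod_{j=1}^{d_i} y_{i,f_i(j)}
&= y_{i_0,f_{i_0}(1)}\, y_{i_0,f_{i_0}(2)} \prod_{j=3}^{d_{i_0}} y_{i_0,f_{i_0}(j)} \prod_{\substack{i=1 \\ i \neq i_0}}^{n} \prod_{j=1}^{d_i} y_{i,f_i(j)} \\
&= y_{i_0,f'_{i_0}(2)}\, y_{i_0,f'_{i_0}(1)} \prod_{j=3}^{d_{i_0}} y_{i_0,f'_{i_0}(j)} \prod_{\substack{i=1 \\ i \neq i_0}}^{n} \prod_{j=1}^{d_i} y_{i,f'_i(j)} \\
&= y_{i_0,f'_{i_0}(1)}\, y_{i_0,f'_{i_0}(2)} \prod_{j=3}^{d_{i_0}} y_{i_0,f'_{i_0}(j)} \prod_{\substack{i=1 \\ i \neq i_0}}^{n} \prod_{j=1}^{d_i} y_{i,f'_i(j)} \\
&= \prod_{i=1}^{n} \prod_{j=1}^{d_i} y_{i,f'_i(j)} .
\end{aligned}
$$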

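It follows immediately that (9) evaluates to zero if there is at least one $i$ such that $d_i \ge 2$,
because the terms of the sum cancel in pairs over a field of characteristic 2.
Thus, in what follows we may assume that $d_i \in \{0, 1\}$ for all $i \in \{1, 2, \ldots, n\}$ and that
$d_1 + d_2 + \cdots + d_n = k$.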

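Monomials with too large hue multiplicities cancel. Next let us streamline notation
a little bit by setting $K = \{i \in \{1, 2, \ldots, n\} : d_i = 1\}$ in (9) and then use the substitution (6),
omitting the common factor $r^k$, to obtain

$$
\begin{aligned}
\sum_{\substack{f_1 : \{1,2,\ldots,d_1\} \to L \\ f_2 : \{1,2,\ldots,d_2\} \to L \\ \vdots \\ f_n : \{1,2,\ldots,d_n\} \to L \\ f_1 \cup f_2 \cup \cdots \cup f_n = L}} \prod_{i=1}^{n} \prod_{j=1}^{d_i} y_{i,f_i(j)}
&= \sum_{\substack{f : K \to L \\ f \text{ bijective}}} \prod_{i \in K} y_{i,f(i)} \\
&= \sum_{\substack{f : K \to L \\ f \text{ bijective}}} \prod_{i \in K} \Bigl( \sum_{s \in S(h(i))} z_{i,s}\, w_{h(i),s,f(i)} \Bigr) \\
&= \sum_{\substack{f : K \to L \\ f \text{ bijective}}} \; \sum_{\substack{g : K \to \bigcup_{i \in K} S(h(i)) \\ g(i) \in S(h(i)) \text{ for all } i \in K}} \prod_{i \in K} z_{i,g(i)}\, w_{h(i),g(i),f(i)} .
\end{aligned}
\tag{11}
$$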

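Now let us say that $g$ is polychromatic if for all $i_1, i_2 \in K$ with $i_1 \neq i_2$ it holds that $g(i_1) \neq g(i_2)$
whenever $h(i_1) = h(i_2)$. That is, identical hue implies distinct saturation for $i_1$ and $i_2$.
Changing the order of summation in (11), we claim that the inner sum in

$$
\sum_{\substack{g : K \to \bigcup_{i \in K} S(h(i)) \\ g(i) \in S(h(i)) \text{ for all } i \in K}} \; \sum_{\substack{f : K \to L \\ f \text{ bijective}}} \prod_{i \in K} z_{i,g(i)}\, w_{h(i),g(i),f(i)} \tag{12}
$$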

evaluates to zero if $g$ is not polychromatic. So assume that $g_0$ is not polychromatic, and let
$i_1 < i_2$ be the lexicographically least pair such that $g_0(i_1) = g_0(i_2)$ and $h(i_1) = h(i_2)$. Let
us define a fixed-point-free involution on the space of all bijections $f : K \to L$ by defining
$f' : K \to L$ for all $i \in K$ by

$$
f'(i) =
\begin{cases}
f(i_2) & \text{if } i = i_1, \\
f(i_1) & \text{if } i = i_2, \\
f(i) & \text{otherwise.}
\end{cases}
\tag{13}
$$

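We observe that $f'$ is bijective, that $f' \neq f$, and that $f'' = f$. Thus, $f \mapsto f'$ is a
fixed-point-free involution. Furthermore, from (13) it is immediate that

$$
\begin{aligned}
\prod_{i \in K} z_{i,g_0(i)}\, w_{h(i),g_0(i),f(i)}
&= z_{i_1,g_0(i_1)}\, w_{h(i_1),g_0(i_1),f(i_1)}\; z_{i_2,g_0(i_2)}\, w_{h(i_2),g_0(i_2),f(i_2)} \prod_{\substack{i \in K \\ i \neq i_1, i_2}} z_{i,g_0(i)}\, w_{h(i),g_0(i),f(i)} \\
&= z_{i_1,g_0(i_1)}\, w_{h(i_1),g_0(i_1),f'(i_2)}\; z_{i_2,g_0(i_2)}\, w_{h(i_2),g_0(i_2),f'(i_1)} \prod_{\substack{i \in K \\ i \neq i_1, i_2}} z_{i,g_0(i)}\, w_{h(i),g_0(i),f'(i)} \\
&= z_{i_1,g_0(i_1)}\, w_{h(i_2),g_0(i_2),f'(i_2)}\; z_{i_2,g_0(i_2)}\, w_{h(i_1),g_0(i_1),f'(i_1)} \prod_{\substack{i \in K \\ i \neq i_1, i_2}} z_{i,g_0(i)}\, w_{h(i),g_0(i),f'(i)} \\
&= z_{i_1,g_0(i_1)}\, w_{h(i_1),g_0(i_1),f'(i_1)}\; z_{i_2,g_0(i_2)}\, w_{h(i_2),g_0(i_2),f'(i_2)} \prod_{\substack{i \in K \\ i \neq i_1, i_2}} z_{i,g_0(i)}\, w_{h(i),g_0(i),f'(i)} \\
&= \prod_{i \in K} z_{i,g_0(i)}\, w_{h(i),g_0(i),f'(i)} .
\end{aligned}
$$

We conclude that the inner sum in (12) evaluates to zero whenever $g$ is not polychromatic.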

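Combining all of the previous observations, we have that the contribution of the monomial
$x_1^{d_1} x_2^{d_2} \cdots x_n^{d_n}$ to $q_k$ is zero unless $x_1^{d_1} x_2^{d_2} \cdots x_n^{d_n} = \prod_{i \in K} x_i$ for some $K \subseteq \{1, 2, \ldots, n\}$ with
$|K| = k$, in which case the contribution is

$$
\sum_{\substack{f : K \to L \\ f \text{ bijective}}} \; \sum_{\substack{g : K \to \bigcup_{i \in K} S(h(i)) \\ g(i) \in S(h(i)) \text{ for all } i \in K \\ g \text{ polychromatic}}} \prod_{i \in K} z_{i,g(i)}\, w_{h(i),g(i),f(i)} .
$$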

We observe that from each monomial $\prod_{i \in K} z_{i,g(i)}\, w_{h(i),g(i),f(i)}$ we can uniquely recover $K$, $g$,
and $f$. Indeed, the indeterminates $z_{i,s}$ determine both $K$ and $g$, after which $f$ can be recovered
from the indeterminates $w_{u,s,\ell}$ because $g$ is polychromatic.

A polychromatic $g$ exists for a $K$ if and only if for all $u \in H$ it holds that $|h^{-1}(u) \cap K| \le
m(u)$. In particular, viewing $q_k$ as a multivariate polynomial in the indeterminates $z_{i,s}$ and
$w_{u,s,\ell}$, we have that $q_k$ is not identically zero if and only if $C$ is a YES-instance of the
constrained multilinear detection problem. Theorem 11 follows by applying the
DeMillo-Lipton-Schwartz-Zippel Lemma.
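The probability bound in this last step can be spelled out as follows. This is a sketch under the assumption that the lemma is used in its standard form: a polynomial of total degree $D$ that is not identically zero vanishes at a uniformly random point of $\mathbb{F}^N$ with probability at most $D / |\mathbb{F}|$. Each monomial of $q_k$ displayed above contains exactly $k$ indeterminates $z_{i,s}$ and $k$ indeterminates $w_{u,s,\ell}$, so on a YES-instance $q_k$ has total degree $2k$, and with $2^\beta \ge 4k$ we get

```latex
\Pr[\, q_k = 0 \,] \;\le\; \frac{2k}{|\mathbb{F}_{2^\beta}|} \;=\; \frac{2k}{2^{\beta}} \;\le\; \frac{2k}{4k} \;=\; \frac{1}{2} .
```

On a NO-instance $q_k$ is identically zero, so the algorithm outputs NO with probability 1.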



